Skip to content

Conversation

aokblast
Copy link
Contributor

@aokblast aokblast commented Oct 7, 2025

In ELF file, there is a possible extended header for those phnum, shnum,
and shstrndx larger than the maximum of 16 bits. This extended header
use section 0 to record these fields in 32 bits.

We implment this feature so that programs rely on ELFFile::program_headers() can get the
correct number of segments. Also, the consumers don't have to check the
section 0 themselve, insteead, they can use the getPhNum() as an
alternative.

@llvmbot
Copy link
Member

llvmbot commented Oct 7, 2025

@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-lld

Author: None (aokblast)

Changes

In ELF file, there is a possible extended header for those phnum, shnum, and shstrndx larger than the maximum of 16 bits. This extended header use section 0 to record these fields in 32 bits. For most of the ELF writers like lld, we already have the mechanism to synthesis this special section 0. However, the parser part don't have such infra and therefore we add it.

Also, we modify some test cases. For those expected-error test cases, their error emission get early. For those expected-correct test cases, we modify the output since we support more than 65535 sections now.


Patch is 21.57 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162288.diff

8 Files Affected:

  • (modified) lld/ELF/SyntheticSections.cpp (-1)
  • (modified) lld/ELF/Writer.cpp (+10-1)
  • (modified) llvm/include/llvm/Object/ELF.h (+43-10)
  • (modified) llvm/include/llvm/Object/ELFTypes.h (+5)
  • (modified) llvm/test/Object/invalid.test (+2-2)
  • (modified) llvm/test/tools/llvm-objcopy/ELF/many-sections.test (+1-1)
  • (modified) llvm/test/tools/llvm-readobj/ELF/file-headers.test (+63-60)
  • (modified) llvm/tools/llvm-readobj/ELFDumper.cpp (+19-13)
diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp
index bbf4b29a9fda58..71294782d9a31d 100644
--- a/lld/ELF/SyntheticSections.cpp
+++ b/lld/ELF/SyntheticSections.cpp
@@ -4428,7 +4428,6 @@ void elf::writeEhdr(Ctx &ctx, uint8_t *buf, Partition &part) {
   eHdr->e_version = EV_CURRENT;
   eHdr->e_flags = ctx.arg.eflags;
   eHdr->e_ehsize = sizeof(typename ELFT::Ehdr);
-  eHdr->e_phnum = part.phdrs.size();
   eHdr->e_shentsize = sizeof(typename ELFT::Shdr);
 
   if (!ctx.arg.relocatable) {
diff --git a/lld/ELF/Writer.cpp b/lld/ELF/Writer.cpp
index 4fa80397cbfa74..713d8279aab541 100644
--- a/lld/ELF/Writer.cpp
+++ b/lld/ELF/Writer.cpp
@@ -2907,7 +2907,8 @@ template <class ELFT> void Writer<ELFT>::writeHeader() {
   // the value. The sentinel values and fields are:
   // e_shnum = 0, SHdrs[0].sh_size = number of sections.
   // e_shstrndx = SHN_XINDEX, SHdrs[0].sh_link = .shstrtab section index.
-  auto *sHdrs = reinterpret_cast<Elf_Shdr *>(ctx.bufferStart + eHdr->e_shoff);
+  // e_phnum = 0xFFFF, SHdrs[0]
+  auto *sHdrs = reinterpret_cast<Elf_Shdr *>(ctx.bufferStart + eHdr->e_smhoff);
   size_t num = ctx.outputSections.size() + 1;
   if (num >= SHN_LORESERVE)
     sHdrs->sh_size = num;
@@ -2922,6 +2923,14 @@ template <class ELFT> void Writer<ELFT>::writeHeader() {
     eHdr->e_shstrndx = strTabIndex;
   }
 
+  num = part.phdrs.size();
+  if (num >= 0xFFFF) {
+    eHdr->e_phnum = 0xFFFF;
+    sHdrs->sh_info = num;
+  } else {
+    eHdr->e_phnum = num;
+  }
+
   for (OutputSection *sec : ctx.outputSections)
     sec->writeHeaderTo<ELFT>(++sHdrs);
 }
diff --git a/llvm/include/llvm/Object/ELF.h b/llvm/include/llvm/Object/ELF.h
index 59f63eb6b5bb69..4374924371d1cd 100644
--- a/llvm/include/llvm/Object/ELF.h
+++ b/llvm/include/llvm/Object/ELF.h
@@ -278,9 +278,17 @@ class ELFFile {
   std::vector<Elf_Shdr> FakeSections;
   SmallString<0> FakeSectionStrings;
 
+  // Handle extended header in section 0
+  Elf_Word e_phnum;
+  Elf_Word e_shnum;
+  Elf_Word e_shstrndx;
+
   ELFFile(StringRef Object);
 
 public:
+  const Elf_Word getPhNum() const { return e_phnum; }
+  const Elf_Word getShNum() const { return e_shnum; }
+  const Elf_Word getShStrNdx() const { return e_shstrndx; }
   const Elf_Ehdr &getHeader() const {
     return *reinterpret_cast<const Elf_Ehdr *>(base());
   }
@@ -379,22 +387,21 @@ class ELFFile {
 
   /// Iterate over program header table.
   Expected<Elf_Phdr_Range> program_headers() const {
-    if (getHeader().e_phnum && getHeader().e_phentsize != sizeof(Elf_Phdr))
+    if (e_phnum && getHeader().e_phentsize != sizeof(Elf_Phdr))
       return createError("invalid e_phentsize: " +
                          Twine(getHeader().e_phentsize));
 
-    uint64_t HeadersSize =
-        (uint64_t)getHeader().e_phnum * getHeader().e_phentsize;
+    uint64_t HeadersSize = (uint64_t)e_phnum * getHeader().e_phentsize;
     uint64_t PhOff = getHeader().e_phoff;
     if (PhOff + HeadersSize < PhOff || PhOff + HeadersSize > getBufSize())
       return createError("program headers are longer than binary of size " +
                          Twine(getBufSize()) + ": e_phoff = 0x" +
                          Twine::utohexstr(getHeader().e_phoff) +
-                         ", e_phnum = " + Twine(getHeader().e_phnum) +
+                         ", e_phnum = " + Twine(e_phnum) +
                          ", e_phentsize = " + Twine(getHeader().e_phentsize));
 
     auto *Begin = reinterpret_cast<const Elf_Phdr *>(base() + PhOff);
-    return ArrayRef(Begin, Begin + getHeader().e_phnum);
+    return ArrayRef(Begin, Begin + e_phnum);
   }
 
   /// Get an iterator over notes in a program header.
@@ -772,7 +779,7 @@ template <class ELFT>
 Expected<StringRef>
 ELFFile<ELFT>::getSectionStringTable(Elf_Shdr_Range Sections,
                                      WarningHandler WarnHandler) const {
-  uint32_t Index = getHeader().e_shstrndx;
+  uint32_t Index = e_shstrndx;
   if (Index == ELF::SHN_XINDEX) {
     // If the section name string table section index is greater than
     // or equal to SHN_LORESERVE, then the actual index of the section name
@@ -889,7 +896,12 @@ Expected<uint64_t> ELFFile<ELFT>::getDynSymtabSize() const {
   return 0;
 }
 
-template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {}
+template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {
+  auto Header = getHeader();
+  e_phnum = Header.e_phnum;
+  e_shnum = Header.e_shnum;
+  e_shstrndx = Header.e_shstrndx;
+}
 
 template <class ELFT>
 Expected<ELFFile<ELFT>> ELFFile<ELFT>::create(StringRef Object) {
@@ -897,7 +909,29 @@ Expected<ELFFile<ELFT>> ELFFile<ELFT>::create(StringRef Object) {
     return createError("invalid buffer: the size (" + Twine(Object.size()) +
                        ") is smaller than an ELF header (" +
                        Twine(sizeof(Elf_Ehdr)) + ")");
-  return ELFFile(Object);
+  ELFFile Result(Object);
+
+  //
+  // sections() parse the total number of sections by considering the
+  // extended headers
+  //
+  if (Result.getHeader().HasHeaderExtension()) {
+    auto TableOrErr = Result.sections();
+    if (!TableOrErr)
+      return TableOrErr.takeError();
+    if ((*TableOrErr).size() == 0)
+      return Result;
+    auto SecOrErr = object::getSection<ELFT>(*TableOrErr, 0);
+    if (!SecOrErr)
+      return SecOrErr.takeError();
+    if (Result.e_phnum == 0xFFFF)
+      Result.e_phnum = (*SecOrErr)->sh_info;
+    if (Result.e_shnum == ELF::SHN_UNDEF)
+      Result.e_shnum = (*SecOrErr)->sh_size;
+    if (Result.e_shstrndx == ELF::SHN_XINDEX)
+      Result.e_shstrndx = (*SecOrErr)->sh_link;
+  }
+  return Result;
 }
 
 /// Used by llvm-objdump -d (which needs sections for disassembly) to
@@ -940,7 +974,6 @@ Expected<typename ELFT::ShdrRange> ELFFile<ELFT>::sections() const {
   if (getHeader().e_shentsize != sizeof(Elf_Shdr))
     return createError("invalid e_shentsize in ELF header: " +
                        Twine(getHeader().e_shentsize));
-
   const uint64_t FileSize = Buf.size();
   if (SectionTableOffset + sizeof(Elf_Shdr) > FileSize ||
       SectionTableOffset + (uintX_t)sizeof(Elf_Shdr) < SectionTableOffset)
@@ -956,7 +989,7 @@ Expected<typename ELFT::ShdrRange> ELFFile<ELFT>::sections() const {
   const Elf_Shdr *First =
       reinterpret_cast<const Elf_Shdr *>(base() + SectionTableOffset);
 
-  uintX_t NumSections = getHeader().e_shnum;
+  uintX_t NumSections = e_shnum;
   if (NumSections == 0)
     NumSections = First->sh_size;
 
diff --git a/llvm/include/llvm/Object/ELFTypes.h b/llvm/include/llvm/Object/ELFTypes.h
index 5a26e2fc314586..232f6be9b4c498 100644
--- a/llvm/include/llvm/Object/ELFTypes.h
+++ b/llvm/include/llvm/Object/ELFTypes.h
@@ -529,6 +529,11 @@ struct Elf_Ehdr_Impl {
 
   unsigned char getFileClass() const { return e_ident[ELF::EI_CLASS]; }
   unsigned char getDataEncoding() const { return e_ident[ELF::EI_DATA]; }
+  bool HasHeaderExtension() const {
+    return (e_phnum == 0xFFFF || e_shnum == ELF::SHN_UNDEF ||
+            ELF::SHN_XINDEX == e_phnum) &&
+           e_shoff != 0;
+  }
 };
 
 template <endianness Endianness>
diff --git a/llvm/test/Object/invalid.test b/llvm/test/Object/invalid.test
index 58ec3cbeadd192..2bf23b45cdbb8d 100644
--- a/llvm/test/Object/invalid.test
+++ b/llvm/test/Object/invalid.test
@@ -556,7 +556,7 @@ Sections:
 # RUN: yaml2obj --docnum=25 %s -o %t25
 # RUN: not llvm-readobj -h %t25 2>&1 | FileCheck -DFILE=%t25 --check-prefix=INVALID-SEC-NUM1 %s
 
-# INVALID-SEC-NUM1: error: '[[FILE]]': unable to continue dumping, the file is corrupt: invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
+# INVALID-SEC-NUM1: error: '[[FILE]]': invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
 
 --- !ELF
 FileHeader:
@@ -575,7 +575,7 @@ Sections:
 # RUN: yaml2obj --docnum=26 %s -o %t26
 # RUN: not llvm-readobj -h %t26 2>&1 | FileCheck -DFILE=%t26 --check-prefix=INVALID-SEC-NUM2 %s
 
-# INVALID-SEC-NUM2: error: '[[FILE]]': unable to continue dumping, the file is corrupt: invalid number of sections specified in the NULL section's sh_size field (288230376151711744)
+# INVALID-SEC-NUM2: error: '[[FILE]]': invalid number of sections specified in the NULL section's sh_size field (288230376151711744)
 
 --- !ELF
 FileHeader:
diff --git a/llvm/test/tools/llvm-objcopy/ELF/many-sections.test b/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
index 6622db237026fa..8b49454f985785 100644
--- a/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
+++ b/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
@@ -6,7 +6,7 @@ RUN: llvm-readobj --file-headers --sections --symbols %t2 | FileCheck %s
 RUN: llvm-readelf --symbols %t2 | FileCheck --check-prefix=SYMS %s
 
 ## The ELF header should have e_shnum == 0 and e_shstrndx == SHN_XINDEX.
-# CHECK:        SectionHeaderCount: 0
+# CHECK:        SectionHeaderCount: 0 (65540)
 # CHECK-NEXT:   StringTableSectionIndex: 65535
 
 ## The first section header should store the real section header count and
diff --git a/llvm/test/tools/llvm-readobj/ELF/file-headers.test b/llvm/test/tools/llvm-readobj/ELF/file-headers.test
index 97ab9f092b2287..d2fbed1b756564 100644
--- a/llvm/test/tools/llvm-readobj/ELF/file-headers.test
+++ b/llvm/test/tools/llvm-readobj/ELF/file-headers.test
@@ -143,64 +143,67 @@ FileHeader:
 # RUN: yaml2obj %s --docnum=4 -o %t.invalid1
 # RUN: not llvm-readobj --file-headers %t.invalid1 2>&1 \
 # RUN:  | FileCheck %s --implicit-check-not=warning: -DFILE=%t.invalid1 \
-# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
+# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1
 # RUN: not llvm-readelf --file-headers %t.invalid1 2>&1 \
 # RUN:  | FileCheck %s --implicit-check-not=warning: -DFILE=%t.invalid1 \
-# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU
-
-# INVALID-LLVM:      File: [[FILE]]
-# INVALID-LLVM-NEXT: Format: elf64-unknown
-# INVALID-LLVM-NEXT: Arch: unknown
-# INVALID-LLVM-NEXT: AddressSize: 64bit
-# INVALID-LLVM-NEXT: LoadName: <Not found>
-# INVALID-LLVM-NEXT: ElfHeader {
-# INVALID-LLVM-NEXT:   Ident {
-# INVALID-LLVM-NEXT:     Magic: (7F 45 4C 46)
-# INVALID-LLVM-NEXT:     Class: 64-bit (0x2)
-# INVALID-LLVM-NEXT:     DataEncoding: LittleEndian (0x1)
-# INVALID-LLVM-NEXT:     FileVersion: 1
-# INVALID-LLVM-NEXT:     OS/ABI: SystemV (0x0)
-# INVALID-LLVM-NEXT:     ABIVersion: 0
-# INVALID-LLVM-NEXT:     Unused: (00 00 00 00 00 00 00)
-# INVALID-LLVM-NEXT:   }
-# INVALID-LLVM-NEXT:   Type: Relocatable (0x1)
-# INVALID-LLVM-NEXT:   Machine: EM_NONE (0x0)
-# INVALID-LLVM-NEXT:   Version: 1
-# INVALID-LLVM-NEXT:   Entry: 0x0
-# INVALID-LLVM-NEXT:   ProgramHeaderOffset: 0x0
-# INVALID-LLVM-NEXT:   SectionHeaderOffset: 0x1000
-# INVALID-LLVM-NEXT:   Flags [ (0x0)
-# INVALID-LLVM-NEXT:   ]
-# INVALID-LLVM-NEXT:   HeaderSize: 64
-# INVALID-LLVM-NEXT:   ProgramHeaderEntrySize: 0
-# INVALID-LLVM-NEXT:   ProgramHeaderCount: 0
-# INVALID-LLVM-NEXT:   SectionHeaderEntrySize: 64
-# INVALID-LLVM-NEXT:   SectionHeaderCount: [[SECHDRCOUNT]]
-# INVALID-LLVM-NEXT:   StringTableSectionIndex: [[SECHDRSTRTABINDEX]]
-# INVALID-LLVM-NEXT: }
-# INVALID-LLVM-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
-
-# INVALID-GNU:      ELF Header:
-# INVALID-GNU-NEXT:   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
-# INVALID-GNU-NEXT:   Class:                             ELF64
-# INVALID-GNU-NEXT:   Data:                              2's complement, little endian
-# INVALID-GNU-NEXT:   Version:                           1 (current)
-# INVALID-GNU-NEXT:   OS/ABI:                            UNIX - System V
-# INVALID-GNU-NEXT:   ABI Version:                       0
-# INVALID-GNU-NEXT:   Type:                              REL (Relocatable file)
-# INVALID-GNU-NEXT:   Machine:                           None
-# INVALID-GNU-NEXT:   Version:                           0x1
-# INVALID-GNU-NEXT:   Entry point address:               0x0
-# INVALID-GNU-NEXT:   Start of program headers:          0 (bytes into file)
-# INVALID-GNU-NEXT:   Start of section headers:          4096 (bytes into file)
-# INVALID-GNU-NEXT:   Flags:                             0x0
-# INVALID-GNU-NEXT:   Size of this header:               64 (bytes)
-# INVALID-GNU-NEXT:   Size of program headers:           0 (bytes)
-# INVALID-GNU-NEXT:   Number of program headers:         0
-# INVALID-GNU-NEXT:   Size of section headers:           64 (bytes)
-# INVALID-GNU-NEXT:   Number of section headers:         [[SECHDRCOUNT]]
-# INVALID-GNU-NEXT:   Section header string table index: [[SECHDRSTRTABINDEX]]
-# INVALID-GNU-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU-TC1
+
+# INVALID-LLVM-TC1:      File: [[FILE]]
+# INVALID-LLVM-TC1-NEXT: Format: elf64-unknown
+# INVALID-LLVM-TC1-NEXT: Arch: unknown
+# INVALID-LLVM-TC1-NEXT: AddressSize: 64bit
+# INVALID-LLVM-TC1-NEXT: LoadName: <Not found>
+# INVALID-LLVM-TC1-NEXT: ElfHeader {
+# INVALID-LLVM-TC1-NEXT:   Ident {
+# INVALID-LLVM-TC1-NEXT:     Magic: (7F 45 4C 46)
+# INVALID-LLVM-TC1-NEXT:     Class: 64-bit (0x2)
+# INVALID-LLVM-TC1-NEXT:     DataEncoding: LittleEndian (0x1)
+# INVALID-LLVM-TC1-NEXT:     FileVersion: 1
+# INVALID-LLVM-TC1-NEXT:     OS/ABI: SystemV (0x0)
+# INVALID-LLVM-TC1-NEXT:     ABIVersion: 0
+# INVALID-LLVM-TC1-NEXT:     Unused: (00 00 00 00 00 00 00)
+# INVALID-LLVM-TC1-NEXT:   }
+# INVALID-LLVM-TC1-NEXT:   Type: Relocatable (0x1)
+# INVALID-LLVM-TC1-NEXT:   Machine: EM_NONE (0x0)
+# INVALID-LLVM-TC1-NEXT:   Version: 1
+# INVALID-LLVM-TC1-NEXT:   Entry: 0x0
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderOffset: 0x0
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderOffset: 0x1000
+# INVALID-LLVM-TC1-NEXT:   Flags [ (0x0)
+# INVALID-LLVM-TC1-NEXT:   ]
+# INVALID-LLVM-TC1-NEXT:   HeaderSize: 64
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderEntrySize: 0
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderCount: 0
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderEntrySize: 64
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderCount: [[SECHDRCOUNT]]
+# INVALID-LLVM-TC1-NEXT:   StringTableSectionIndex: [[SECHDRSTRTABINDEX]]
+# INVALID-LLVM-TC1-NEXT: }
+# INVALID-LLVM-TC1-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+
+# INVALID-GNU-TC1:      ELF Header:
+# INVALID-GNU-TC1-NEXT:   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
+# INVALID-GNU-TC1-NEXT:   Class:                             ELF64
+# INVALID-GNU-TC1-NEXT:   Data:                              2's complement, little endian
+# INVALID-GNU-TC1-NEXT:   Version:                           1 (current)
+# INVALID-GNU-TC1-NEXT:   OS/ABI:                            UNIX - System V
+# INVALID-GNU-TC1-NEXT:   ABI Version:                       0
+# INVALID-GNU-TC1-NEXT:   Type:                              REL (Relocatable file)
+# INVALID-GNU-TC1-NEXT:   Machine:                           None
+# INVALID-GNU-TC1-NEXT:   Version:                           0x1
+# INVALID-GNU-TC1-NEXT:   Entry point address:               0x0
+# INVALID-GNU-TC1-NEXT:   Start of program headers:          0 (bytes into file)
+# INVALID-GNU-TC1-NEXT:   Start of section headers:          4096 (bytes into file)
+# INVALID-GNU-TC1-NEXT:   Flags:                             0x0
+# INVALID-GNU-TC1-NEXT:   Size of this header:               64 (bytes)
+# INVALID-GNU-TC1-NEXT:   Size of program headers:           0 (bytes)
+# INVALID-GNU-TC1-NEXT:   Number of program headers:         0
+# INVALID-GNU-TC1-NEXT:   Size of section headers:           64 (bytes)
+# INVALID-GNU-TC1-NEXT:   Number of section headers:         [[SECHDRCOUNT]]
+# INVALID-GNU-TC1-NEXT:   Section header string table index: [[SECHDRSTRTABINDEX]]
+# INVALID-GNU-TC1-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+
+# INVALID-LLVM-TC2: error: '[[FILE]]': section header table goes past the end of the file: e_shoff = 0x1000
+# INVALID-GNU-TC2: error: '[[FILE]]': section header table goes past the end of the file: e_shoff = 0x1000
 
 --- !ELF
 FileHeader:
@@ -222,14 +225,14 @@ Sections:
 ## Check we don't dump anything except the file header when the section header table can't be read.
 
 # RUN: not llvm-readobj -a %t.invalid1 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
+# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1
 # RUN: not llvm-readelf -a %t.invalid1 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU
+# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU-TC1
 
 ## Check what we print when e_shnum == 0, e_shstrndx == SHN_XINDEX and the section header table can't be read.
 
 # RUN: yaml2obj %s -DSHNUM=0 -DSHSTRNDX=0xffff --docnum=4 -o %t.invalid2
 # RUN: not llvm-readobj --file-headers %t.invalid2 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-LLVM
+# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-LLVM-TC2
 # RUN: not llvm-readelf --file-headers %t.invalid2 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-GNU
+# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-GNU-TC2
diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp
index ab93316907cc6f..1cfa138d7a7ea7 100644
--- a/llvm/tools/llvm-readobj/ELFDumper.cpp
+++ b/llvm/tools/llvm-readobj/ELFDumper.cpp
@@ -3575,9 +3575,16 @@ static inline void printFields(formatted_raw_ostream &OS, StringRef Str1,
 template <class ELFT>
 static std::string getSectionHeadersNumString(const ELFFile<ELFT> &Obj,
                                               StringRef FileName) {
-  const typename ELFT::Ehdr &ElfHeader = Obj.getHeader();
-  if (ElfHeader.e_shnum != 0)
-    return to_string(ElfHeader.e_shnum);
+  if (Obj.getHeader().e_shnum != 0) {
+    std::string Result;
+    if (Obj.getHeader().e_shnum != Obj.getShNum())
+      raw_string_ostream(Result)
+          << format("%x (%x)", static_cast<int>(Obj.getHeader().e_shnum),
+                    static_cast<int>(Obj.getShNum()));
+    else
+      raw_string_ostream(Result) << Obj.getHeader().e_shnum;
+    return Result;
+  }
 
   Expected<ArrayRef<typename ELFT::Shdr>> ArrOrErr = Obj.sections();
   if (!ArrOrErr) {
@@ -3595,9 +3602,10 @@ static std::string getSectionHeadersNumString(const ELFFile<ELFT> &Obj,
 template <class ELFT>
 static std::string getSectionHeaderTableIndexString(const ELFFile<ELFT> &Obj,
                                                     StringRef FileName) {
-  const typename ELFT::Ehdr &ElfHeader = Obj.getHeader();
-  if (ElfHeader.e_shstrndx != SHN_XINDEX)
-    return to_string(ElfHeader.e_shstrndx);
+  auto strndx = Obj.getHeader().e_shstrndx;
+
+  if (strndx != SHN_XINDEX)
+    return to_string(strndx);
 
   Expected<ArrayRef<typename ELFT::Shdr>> ArrOrErr = Obj.sections();
   if (!ArrOrErr) {
@@ -3609,8 +3617,7 @@ static std::string getSectionHeaderTableIndexString(const ELFFile<ELFT> &Obj,
 
   if (ArrOrErr->empty())
     return "65535 (corrupt: out of range)";
-  return to_string(ElfHeader.e_shstrndx) + " (" +
-         to_string((*ArrOrErr)[0].sh_link) + ")";
+  return to_string(strndx) + " (" + to_string(Obj.getShStrNdx()) + ")";
 }
 
 static const EnumEntry<unsigned> *getObjectFileEnumEntry(unsigned Type) {
@@ -3765,7 +3772,7 @@ template <class ELFT> void GNUELFDumper<ELFT>::printFileHeaders() {
   printFields(OS, "Size of this header:", Str);
   Str = to_string(e.e_phentsize) + " (bytes)";
   printFields(OS, "Size of program headers:", Str);
- ...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Oct 7, 2025

@llvm/pr-subscribers-lld-elf

Author: None (aokblast)

Changes

In ELF file, there is a possible extended header for those phnum, shnum, and shstrndx larger than the maximum of 16 bits. This extended header use section 0 to record these fields in 32 bits. For most of the ELF writers like lld, we already have the mechanism to synthesis this special section 0. However, the parser part don't have such infra and therefore we add it.

Also, we modify some test cases. For those expected-error test cases, their error emission get early. For those expected-correct test cases, we modify the output since we support more than 65535 sections now.


Patch is 21.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162288.diff

8 Files Affected:

  • (modified) lld/ELF/SyntheticSections.cpp (-1)
  • (modified) lld/ELF/Writer.cpp (+10-1)
  • (modified) llvm/include/llvm/Object/ELF.h (+43-10)
  • (modified) llvm/include/llvm/Object/ELFTypes.h (+5)
  • (modified) llvm/test/Object/invalid.test (+2-2)
  • (modified) llvm/test/tools/llvm-objcopy/ELF/many-sections.test (+1-1)
  • (modified) llvm/test/tools/llvm-readobj/ELF/file-headers.test (+63-60)
  • (modified) llvm/tools/llvm-readobj/ELFDumper.cpp (+19-13)
diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp
index bbf4b29a9fda5..71294782d9a31 100644
--- a/lld/ELF/SyntheticSections.cpp
+++ b/lld/ELF/SyntheticSections.cpp
@@ -4428,7 +4428,6 @@ void elf::writeEhdr(Ctx &ctx, uint8_t *buf, Partition &part) {
   eHdr->e_version = EV_CURRENT;
   eHdr->e_flags = ctx.arg.eflags;
   eHdr->e_ehsize = sizeof(typename ELFT::Ehdr);
-  eHdr->e_phnum = part.phdrs.size();
   eHdr->e_shentsize = sizeof(typename ELFT::Shdr);
 
   if (!ctx.arg.relocatable) {
diff --git a/lld/ELF/Writer.cpp b/lld/ELF/Writer.cpp
index 4fa80397cbfa7..713d8279aab54 100644
--- a/lld/ELF/Writer.cpp
+++ b/lld/ELF/Writer.cpp
@@ -2907,7 +2907,8 @@ template <class ELFT> void Writer<ELFT>::writeHeader() {
   // the value. The sentinel values and fields are:
   // e_shnum = 0, SHdrs[0].sh_size = number of sections.
   // e_shstrndx = SHN_XINDEX, SHdrs[0].sh_link = .shstrtab section index.
-  auto *sHdrs = reinterpret_cast<Elf_Shdr *>(ctx.bufferStart + eHdr->e_shoff);
+  // e_phnum = 0xFFFF, SHdrs[0]
+  auto *sHdrs = reinterpret_cast<Elf_Shdr *>(ctx.bufferStart + eHdr->e_smhoff);
   size_t num = ctx.outputSections.size() + 1;
   if (num >= SHN_LORESERVE)
     sHdrs->sh_size = num;
@@ -2922,6 +2923,14 @@ template <class ELFT> void Writer<ELFT>::writeHeader() {
     eHdr->e_shstrndx = strTabIndex;
   }
 
+  num = part.phdrs.size();
+  if (num >= 0xFFFF) {
+    eHdr->e_phnum = 0xFFFF;
+    sHdrs->sh_info = num;
+  } else {
+    eHdr->e_phnum = num;
+  }
+
   for (OutputSection *sec : ctx.outputSections)
     sec->writeHeaderTo<ELFT>(++sHdrs);
 }
diff --git a/llvm/include/llvm/Object/ELF.h b/llvm/include/llvm/Object/ELF.h
index 59f63eb6b5bb6..4374924371d1c 100644
--- a/llvm/include/llvm/Object/ELF.h
+++ b/llvm/include/llvm/Object/ELF.h
@@ -278,9 +278,17 @@ class ELFFile {
   std::vector<Elf_Shdr> FakeSections;
   SmallString<0> FakeSectionStrings;
 
+  // Handle extended header in section 0
+  Elf_Word e_phnum;
+  Elf_Word e_shnum;
+  Elf_Word e_shstrndx;
+
   ELFFile(StringRef Object);
 
 public:
+  const Elf_Word getPhNum() const { return e_phnum; }
+  const Elf_Word getShNum() const { return e_shnum; }
+  const Elf_Word getShStrNdx() const { return e_shstrndx; }
   const Elf_Ehdr &getHeader() const {
     return *reinterpret_cast<const Elf_Ehdr *>(base());
   }
@@ -379,22 +387,21 @@ class ELFFile {
 
   /// Iterate over program header table.
   Expected<Elf_Phdr_Range> program_headers() const {
-    if (getHeader().e_phnum && getHeader().e_phentsize != sizeof(Elf_Phdr))
+    if (e_phnum && getHeader().e_phentsize != sizeof(Elf_Phdr))
       return createError("invalid e_phentsize: " +
                          Twine(getHeader().e_phentsize));
 
-    uint64_t HeadersSize =
-        (uint64_t)getHeader().e_phnum * getHeader().e_phentsize;
+    uint64_t HeadersSize = (uint64_t)e_phnum * getHeader().e_phentsize;
     uint64_t PhOff = getHeader().e_phoff;
     if (PhOff + HeadersSize < PhOff || PhOff + HeadersSize > getBufSize())
       return createError("program headers are longer than binary of size " +
                          Twine(getBufSize()) + ": e_phoff = 0x" +
                          Twine::utohexstr(getHeader().e_phoff) +
-                         ", e_phnum = " + Twine(getHeader().e_phnum) +
+                         ", e_phnum = " + Twine(e_phnum) +
                          ", e_phentsize = " + Twine(getHeader().e_phentsize));
 
     auto *Begin = reinterpret_cast<const Elf_Phdr *>(base() + PhOff);
-    return ArrayRef(Begin, Begin + getHeader().e_phnum);
+    return ArrayRef(Begin, Begin + e_phnum);
   }
 
   /// Get an iterator over notes in a program header.
@@ -772,7 +779,7 @@ template <class ELFT>
 Expected<StringRef>
 ELFFile<ELFT>::getSectionStringTable(Elf_Shdr_Range Sections,
                                      WarningHandler WarnHandler) const {
-  uint32_t Index = getHeader().e_shstrndx;
+  uint32_t Index = e_shstrndx;
   if (Index == ELF::SHN_XINDEX) {
     // If the section name string table section index is greater than
     // or equal to SHN_LORESERVE, then the actual index of the section name
@@ -889,7 +896,12 @@ Expected<uint64_t> ELFFile<ELFT>::getDynSymtabSize() const {
   return 0;
 }
 
-template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {}
+template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {
+  auto Header = getHeader();
+  e_phnum = Header.e_phnum;
+  e_shnum = Header.e_shnum;
+  e_shstrndx = Header.e_shstrndx;
+}
 
 template <class ELFT>
 Expected<ELFFile<ELFT>> ELFFile<ELFT>::create(StringRef Object) {
@@ -897,7 +909,29 @@ Expected<ELFFile<ELFT>> ELFFile<ELFT>::create(StringRef Object) {
     return createError("invalid buffer: the size (" + Twine(Object.size()) +
                        ") is smaller than an ELF header (" +
                        Twine(sizeof(Elf_Ehdr)) + ")");
-  return ELFFile(Object);
+  ELFFile Result(Object);
+
+  //
+  // sections() parse the total number of sections by considering the
+  // extended headers
+  //
+  if (Result.getHeader().HasHeaderExtension()) {
+    auto TableOrErr = Result.sections();
+    if (!TableOrErr)
+      return TableOrErr.takeError();
+    if ((*TableOrErr).size() == 0)
+      return Result;
+    auto SecOrErr = object::getSection<ELFT>(*TableOrErr, 0);
+    if (!SecOrErr)
+      return SecOrErr.takeError();
+    if (Result.e_phnum == 0xFFFF)
+      Result.e_phnum = (*SecOrErr)->sh_info;
+    if (Result.e_shnum == ELF::SHN_UNDEF)
+      Result.e_shnum = (*SecOrErr)->sh_size;
+    if (Result.e_shstrndx == ELF::SHN_XINDEX)
+      Result.e_shstrndx = (*SecOrErr)->sh_link;
+  }
+  return Result;
 }
 
 /// Used by llvm-objdump -d (which needs sections for disassembly) to
@@ -940,7 +974,6 @@ Expected<typename ELFT::ShdrRange> ELFFile<ELFT>::sections() const {
   if (getHeader().e_shentsize != sizeof(Elf_Shdr))
     return createError("invalid e_shentsize in ELF header: " +
                        Twine(getHeader().e_shentsize));
-
   const uint64_t FileSize = Buf.size();
   if (SectionTableOffset + sizeof(Elf_Shdr) > FileSize ||
       SectionTableOffset + (uintX_t)sizeof(Elf_Shdr) < SectionTableOffset)
@@ -956,7 +989,7 @@ Expected<typename ELFT::ShdrRange> ELFFile<ELFT>::sections() const {
   const Elf_Shdr *First =
       reinterpret_cast<const Elf_Shdr *>(base() + SectionTableOffset);
 
-  uintX_t NumSections = getHeader().e_shnum;
+  uintX_t NumSections = e_shnum;
   if (NumSections == 0)
     NumSections = First->sh_size;
 
diff --git a/llvm/include/llvm/Object/ELFTypes.h b/llvm/include/llvm/Object/ELFTypes.h
index 5a26e2fc31458..232f6be9b4c49 100644
--- a/llvm/include/llvm/Object/ELFTypes.h
+++ b/llvm/include/llvm/Object/ELFTypes.h
@@ -529,6 +529,11 @@ struct Elf_Ehdr_Impl {
 
   unsigned char getFileClass() const { return e_ident[ELF::EI_CLASS]; }
   unsigned char getDataEncoding() const { return e_ident[ELF::EI_DATA]; }
+  bool HasHeaderExtension() const {
+    return (e_phnum == 0xFFFF || e_shnum == ELF::SHN_UNDEF ||
+            ELF::SHN_XINDEX == e_phnum) &&
+           e_shoff != 0;
+  }
 };
 
 template <endianness Endianness>
diff --git a/llvm/test/Object/invalid.test b/llvm/test/Object/invalid.test
index 58ec3cbeadd19..2bf23b45cdbb8 100644
--- a/llvm/test/Object/invalid.test
+++ b/llvm/test/Object/invalid.test
@@ -556,7 +556,7 @@ Sections:
 # RUN: yaml2obj --docnum=25 %s -o %t25
 # RUN: not llvm-readobj -h %t25 2>&1 | FileCheck -DFILE=%t25 --check-prefix=INVALID-SEC-NUM1 %s
 
-# INVALID-SEC-NUM1: error: '[[FILE]]': unable to continue dumping, the file is corrupt: invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
+# INVALID-SEC-NUM1: error: '[[FILE]]': invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
 
 --- !ELF
 FileHeader:
@@ -575,7 +575,7 @@ Sections:
 # RUN: yaml2obj --docnum=26 %s -o %t26
 # RUN: not llvm-readobj -h %t26 2>&1 | FileCheck -DFILE=%t26 --check-prefix=INVALID-SEC-NUM2 %s
 
-# INVALID-SEC-NUM2: error: '[[FILE]]': unable to continue dumping, the file is corrupt: invalid number of sections specified in the NULL section's sh_size field (288230376151711744)
+# INVALID-SEC-NUM2: error: '[[FILE]]': invalid number of sections specified in the NULL section's sh_size field (288230376151711744)
 
 --- !ELF
 FileHeader:
diff --git a/llvm/test/tools/llvm-objcopy/ELF/many-sections.test b/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
index 6622db237026f..8b49454f98578 100644
--- a/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
+++ b/llvm/test/tools/llvm-objcopy/ELF/many-sections.test
@@ -6,7 +6,7 @@ RUN: llvm-readobj --file-headers --sections --symbols %t2 | FileCheck %s
 RUN: llvm-readelf --symbols %t2 | FileCheck --check-prefix=SYMS %s
 
 ## The ELF header should have e_shnum == 0 and e_shstrndx == SHN_XINDEX.
-# CHECK:        SectionHeaderCount: 0
+# CHECK:        SectionHeaderCount: 0 (65540)
 # CHECK-NEXT:   StringTableSectionIndex: 65535
 
 ## The first section header should store the real section header count and
diff --git a/llvm/test/tools/llvm-readobj/ELF/file-headers.test b/llvm/test/tools/llvm-readobj/ELF/file-headers.test
index 97ab9f092b228..d2fbed1b75656 100644
--- a/llvm/test/tools/llvm-readobj/ELF/file-headers.test
+++ b/llvm/test/tools/llvm-readobj/ELF/file-headers.test
@@ -143,64 +143,67 @@ FileHeader:
 # RUN: yaml2obj %s --docnum=4 -o %t.invalid1
 # RUN: not llvm-readobj --file-headers %t.invalid1 2>&1 \
 # RUN:  | FileCheck %s --implicit-check-not=warning: -DFILE=%t.invalid1 \
-# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
+# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1
 # RUN: not llvm-readelf --file-headers %t.invalid1 2>&1 \
 # RUN:  | FileCheck %s --implicit-check-not=warning: -DFILE=%t.invalid1 \
-# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU
-
-# INVALID-LLVM:      File: [[FILE]]
-# INVALID-LLVM-NEXT: Format: elf64-unknown
-# INVALID-LLVM-NEXT: Arch: unknown
-# INVALID-LLVM-NEXT: AddressSize: 64bit
-# INVALID-LLVM-NEXT: LoadName: <Not found>
-# INVALID-LLVM-NEXT: ElfHeader {
-# INVALID-LLVM-NEXT:   Ident {
-# INVALID-LLVM-NEXT:     Magic: (7F 45 4C 46)
-# INVALID-LLVM-NEXT:     Class: 64-bit (0x2)
-# INVALID-LLVM-NEXT:     DataEncoding: LittleEndian (0x1)
-# INVALID-LLVM-NEXT:     FileVersion: 1
-# INVALID-LLVM-NEXT:     OS/ABI: SystemV (0x0)
-# INVALID-LLVM-NEXT:     ABIVersion: 0
-# INVALID-LLVM-NEXT:     Unused: (00 00 00 00 00 00 00)
-# INVALID-LLVM-NEXT:   }
-# INVALID-LLVM-NEXT:   Type: Relocatable (0x1)
-# INVALID-LLVM-NEXT:   Machine: EM_NONE (0x0)
-# INVALID-LLVM-NEXT:   Version: 1
-# INVALID-LLVM-NEXT:   Entry: 0x0
-# INVALID-LLVM-NEXT:   ProgramHeaderOffset: 0x0
-# INVALID-LLVM-NEXT:   SectionHeaderOffset: 0x1000
-# INVALID-LLVM-NEXT:   Flags [ (0x0)
-# INVALID-LLVM-NEXT:   ]
-# INVALID-LLVM-NEXT:   HeaderSize: 64
-# INVALID-LLVM-NEXT:   ProgramHeaderEntrySize: 0
-# INVALID-LLVM-NEXT:   ProgramHeaderCount: 0
-# INVALID-LLVM-NEXT:   SectionHeaderEntrySize: 64
-# INVALID-LLVM-NEXT:   SectionHeaderCount: [[SECHDRCOUNT]]
-# INVALID-LLVM-NEXT:   StringTableSectionIndex: [[SECHDRSTRTABINDEX]]
-# INVALID-LLVM-NEXT: }
-# INVALID-LLVM-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
-
-# INVALID-GNU:      ELF Header:
-# INVALID-GNU-NEXT:   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
-# INVALID-GNU-NEXT:   Class:                             ELF64
-# INVALID-GNU-NEXT:   Data:                              2's complement, little endian
-# INVALID-GNU-NEXT:   Version:                           1 (current)
-# INVALID-GNU-NEXT:   OS/ABI:                            UNIX - System V
-# INVALID-GNU-NEXT:   ABI Version:                       0
-# INVALID-GNU-NEXT:   Type:                              REL (Relocatable file)
-# INVALID-GNU-NEXT:   Machine:                           None
-# INVALID-GNU-NEXT:   Version:                           0x1
-# INVALID-GNU-NEXT:   Entry point address:               0x0
-# INVALID-GNU-NEXT:   Start of program headers:          0 (bytes into file)
-# INVALID-GNU-NEXT:   Start of section headers:          4096 (bytes into file)
-# INVALID-GNU-NEXT:   Flags:                             0x0
-# INVALID-GNU-NEXT:   Size of this header:               64 (bytes)
-# INVALID-GNU-NEXT:   Size of program headers:           0 (bytes)
-# INVALID-GNU-NEXT:   Number of program headers:         0
-# INVALID-GNU-NEXT:   Size of section headers:           64 (bytes)
-# INVALID-GNU-NEXT:   Number of section headers:         [[SECHDRCOUNT]]
-# INVALID-GNU-NEXT:   Section header string table index: [[SECHDRSTRTABINDEX]]
-# INVALID-GNU-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+# RUN:    -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU-TC1
+
+# INVALID-LLVM-TC1:      File: [[FILE]]
+# INVALID-LLVM-TC1-NEXT: Format: elf64-unknown
+# INVALID-LLVM-TC1-NEXT: Arch: unknown
+# INVALID-LLVM-TC1-NEXT: AddressSize: 64bit
+# INVALID-LLVM-TC1-NEXT: LoadName: <Not found>
+# INVALID-LLVM-TC1-NEXT: ElfHeader {
+# INVALID-LLVM-TC1-NEXT:   Ident {
+# INVALID-LLVM-TC1-NEXT:     Magic: (7F 45 4C 46)
+# INVALID-LLVM-TC1-NEXT:     Class: 64-bit (0x2)
+# INVALID-LLVM-TC1-NEXT:     DataEncoding: LittleEndian (0x1)
+# INVALID-LLVM-TC1-NEXT:     FileVersion: 1
+# INVALID-LLVM-TC1-NEXT:     OS/ABI: SystemV (0x0)
+# INVALID-LLVM-TC1-NEXT:     ABIVersion: 0
+# INVALID-LLVM-TC1-NEXT:     Unused: (00 00 00 00 00 00 00)
+# INVALID-LLVM-TC1-NEXT:   }
+# INVALID-LLVM-TC1-NEXT:   Type: Relocatable (0x1)
+# INVALID-LLVM-TC1-NEXT:   Machine: EM_NONE (0x0)
+# INVALID-LLVM-TC1-NEXT:   Version: 1
+# INVALID-LLVM-TC1-NEXT:   Entry: 0x0
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderOffset: 0x0
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderOffset: 0x1000
+# INVALID-LLVM-TC1-NEXT:   Flags [ (0x0)
+# INVALID-LLVM-TC1-NEXT:   ]
+# INVALID-LLVM-TC1-NEXT:   HeaderSize: 64
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderEntrySize: 0
+# INVALID-LLVM-TC1-NEXT:   ProgramHeaderCount: 0
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderEntrySize: 64
+# INVALID-LLVM-TC1-NEXT:   SectionHeaderCount: [[SECHDRCOUNT]]
+# INVALID-LLVM-TC1-NEXT:   StringTableSectionIndex: [[SECHDRSTRTABINDEX]]
+# INVALID-LLVM-TC1-NEXT: }
+# INVALID-LLVM-TC1-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+
+# INVALID-GNU-TC1:      ELF Header:
+# INVALID-GNU-TC1-NEXT:   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
+# INVALID-GNU-TC1-NEXT:   Class:                             ELF64
+# INVALID-GNU-TC1-NEXT:   Data:                              2's complement, little endian
+# INVALID-GNU-TC1-NEXT:   Version:                           1 (current)
+# INVALID-GNU-TC1-NEXT:   OS/ABI:                            UNIX - System V
+# INVALID-GNU-TC1-NEXT:   ABI Version:                       0
+# INVALID-GNU-TC1-NEXT:   Type:                              REL (Relocatable file)
+# INVALID-GNU-TC1-NEXT:   Machine:                           None
+# INVALID-GNU-TC1-NEXT:   Version:                           0x1
+# INVALID-GNU-TC1-NEXT:   Entry point address:               0x0
+# INVALID-GNU-TC1-NEXT:   Start of program headers:          0 (bytes into file)
+# INVALID-GNU-TC1-NEXT:   Start of section headers:          4096 (bytes into file)
+# INVALID-GNU-TC1-NEXT:   Flags:                             0x0
+# INVALID-GNU-TC1-NEXT:   Size of this header:               64 (bytes)
+# INVALID-GNU-TC1-NEXT:   Size of program headers:           0 (bytes)
+# INVALID-GNU-TC1-NEXT:   Number of program headers:         0
+# INVALID-GNU-TC1-NEXT:   Size of section headers:           64 (bytes)
+# INVALID-GNU-TC1-NEXT:   Number of section headers:         [[SECHDRCOUNT]]
+# INVALID-GNU-TC1-NEXT:   Section header string table index: [[SECHDRSTRTABINDEX]]
+# INVALID-GNU-TC1-NEXT: error: '[[FILE]]': unable to continue dumping, the file is corrupt: section header table goes past the end of the file: e_shoff = 0x1000
+
+# INVALID-LLVM-TC2: error: '[[FILE]]': section header table goes past the end of the file: e_shoff = 0x1000
+# INVALID-GNU-TC2: error: '[[FILE]]': section header table goes past the end of the file: e_shoff = 0x1000
 
 --- !ELF
 FileHeader:
@@ -222,14 +225,14 @@ Sections:
 ## Check we don't dump anything except the file header when the section header table can't be read.
 
 # RUN: not llvm-readobj -a %t.invalid1 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
+# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1
 # RUN: not llvm-readelf -a %t.invalid1 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU
+# RUN:  | FileCheck %s -DFILE=%t.invalid1 -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-GNU-TC1
 
 ## Check what we print when e_shnum == 0, e_shstrndx == SHN_XINDEX and the section header table can't be read.
 
 # RUN: yaml2obj %s -DSHNUM=0 -DSHSTRNDX=0xffff --docnum=4 -o %t.invalid2
 # RUN: not llvm-readobj --file-headers %t.invalid2 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-LLVM
+# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-LLVM-TC2
 # RUN: not llvm-readelf --file-headers %t.invalid2 2>&1 \
-# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-GNU
+# RUN:  | FileCheck %s -DFILE=%t.invalid2 -DSECHDRCOUNT="<?>" -DSECHDRSTRTABINDEX="<?>" --check-prefix=INVALID-GNU-TC2
diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp
index ab93316907cc6..1cfa138d7a7ea 100644
--- a/llvm/tools/llvm-readobj/ELFDumper.cpp
+++ b/llvm/tools/llvm-readobj/ELFDumper.cpp
@@ -3575,9 +3575,16 @@ static inline void printFields(formatted_raw_ostream &OS, StringRef Str1,
 template <class ELFT>
 static std::string getSectionHeadersNumString(const ELFFile<ELFT> &Obj,
                                               StringRef FileName) {
-  const typename ELFT::Ehdr &ElfHeader = Obj.getHeader();
-  if (ElfHeader.e_shnum != 0)
-    return to_string(ElfHeader.e_shnum);
+  if (Obj.getHeader().e_shnum != 0) {
+    std::string Result;
+    if (Obj.getHeader().e_shnum != Obj.getShNum())
+      raw_string_ostream(Result)
+          << format("%x (%x)", static_cast<int>(Obj.getHeader().e_shnum),
+                    static_cast<int>(Obj.getShNum()));
+    else
+      raw_string_ostream(Result) << Obj.getHeader().e_shnum;
+    return Result;
+  }
 
   Expected<ArrayRef<typename ELFT::Shdr>> ArrOrErr = Obj.sections();
   if (!ArrOrErr) {
@@ -3595,9 +3602,10 @@ static std::string getSectionHeadersNumString(const ELFFile<ELFT> &Obj,
 template <class ELFT>
 static std::string getSectionHeaderTableIndexString(const ELFFile<ELFT> &Obj,
                                                     StringRef FileName) {
-  const typename ELFT::Ehdr &ElfHeader = Obj.getHeader();
-  if (ElfHeader.e_shstrndx != SHN_XINDEX)
-    return to_string(ElfHeader.e_shstrndx);
+  auto strndx = Obj.getHeader().e_shstrndx;
+
+  if (strndx != SHN_XINDEX)
+    return to_string(strndx);
 
   Expected<ArrayRef<typename ELFT::Shdr>> ArrOrErr = Obj.sections();
   if (!ArrOrErr) {
@@ -3609,8 +3617,7 @@ static std::string getSectionHeaderTableIndexString(const ELFFile<ELFT> &Obj,
 
   if (ArrOrErr->empty())
     return "65535 (corrupt: out of range)";
-  return to_string(ElfHeader.e_shstrndx) + " (" +
-         to_string((*ArrOrErr)[0].sh_link) + ")";
+  return to_string(strndx) + " (" + to_string(Obj.getShStrNdx()) + ")";
 }
 
 static const EnumEntry<unsigned> *getObjectFileEnumEntry(unsigned Type) {
@@ -3765,7 +3772,7 @@ template <class ELFT> void GNUELFDumper<ELFT>::printFileHeaders() {
   printFields(OS, "Size of this header:", Str);
   Str = to_string(e.e_phentsize) + " (bytes)";
   printFields(OS, "Size of program headers:", Str);
-  Str = to_string...
[truncated]

@jh7370 jh7370 requested a review from MaskRay October 7, 2025 14:55
Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build is failing. Please fix it.

I've not looked at the change or the classes in depth, but the code already does try to handle large section counts, for example (see e.g. line 993 in ELF.h that you're trying to modify). What exactly are you trying to achieve that this code doesn't already do?

@aokblast
Copy link
Contributor Author

aokblast commented Oct 7, 2025

The build is failing. Please fix it.

I've not looked at the change or the classes in depth, but the code already does try to handle large section counts, for example (see e.g. line 993 in ELF.h that you're trying to modify). What exactly are you trying to achieve that this code doesn't already do?

Thanks for your reply! I forget to enable lld. I am currently compiling and running test.

Some program, like readelf, currently unable to display 65535+ program headers. I think it is better for us to support this from our Support/Object level. Also, without doing this in Support/Object level, the program_headers() in Object/ELF.h also cannot iterate over 65535+ segments, which many programs rely on this to iterate over all program headers without reading the size themselve.

For section, I just handle them together with segment so it would be more consistent. Line 993 should be removed but cannot since when we use getSection() first time, we need to know the correct number of section in section() so that getSection(0) won't complain about OutOfBound.

@jh7370
Copy link
Collaborator

jh7370 commented Oct 7, 2025

Why do you want to support so many program headers? Just because the spec allows us doesn't mean we need to support it, if there's no use case for it.

@aokblast
Copy link
Contributor Author

aokblast commented Oct 7, 2025

Why do you want to support so many program headers? Just because the spec allows us doesn't mean we need to support it, if there's no use case for it.

It is related to the coredump design in FreeBSD. See: #132216. FreeBSD regards a mmap as a segment in coredump. Therefore, a program has 65535 more mmaps actually uses ExtendedHeader to store the actual number of mmaps and its address.
I am not sure if they have CI to check all segment mapping (phdrs), but I think it is possible for them to do so as we have test-cases to check 65535+ sections and the section headers.
If you think that it is more proper to fix in readelf itself, I can do it since the CI that the issue reported only compare the sections number insteead of their address. I just think that it would be great if we can do "check ExtendedHeader exists -> parse sections -> check section0 exists -> get correct section and segment info" in Support/Object instead of everybody has their own copies. But I am not insist on that.

@MaskRay
Copy link
Member

MaskRay commented Oct 8, 2025

Please remove the lld/ELF change. For linker output we don't intend to support the PN_XNUM program header feature. We also don't accept features without a test.

Link: https://groups.google.com/g/generic-abi/c/-J3lNY8ZKkU ("PN_XNUM extension for program headers")

Suggested title: [Object,ELF] Impelment PN_XNUM extension for program headers

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I agree that there's value in having the program header count extension in the parsing and dumping code.

In addition to removing the lld changes, I think this change warrants being split into at leasr two PRs, one (or more) which does the refactoring (if needed) such that both extended program headers and section headers can be handled all together and another that actually implements the program header functionality. You will also probably want a separate change specifically for llvm-objcopy, so that it can write objects containing many program headers.

@@ -278,9 +278,17 @@ class ELFFile {
std::vector<Elf_Shdr> FakeSections;
SmallString<0> FakeSectionStrings;

// Handle extended header in section 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't really add anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

Comment on lines 282 to 284
Elf_Word e_phnum;
Elf_Word e_shnum;
Elf_Word e_shstrndx;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these values aren't the raw values in the header, let's not confuse things by naming them after those fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

@@ -889,15 +896,42 @@ Expected<uint64_t> ELFFile<ELFT>::getDynSymtabSize() const {
return 0;
}

template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {}
template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {
auto Header = getHeader();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer to the LLVM style guide about using auto.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

@@ -529,6 +529,11 @@ struct Elf_Ehdr_Impl {

unsigned char getFileClass() const { return e_ident[ELF::EI_CLASS]; }
unsigned char getDataEncoding() const { return e_ident[ELF::EI_DATA]; }
bool HasHeaderExtension() const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review the LLVM style guide regarding naming conventions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

Copy link
Member

@MaskRay MaskRay Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

header isn't clear - it could mean elf header, program header, or section header.

perhaps hasPhdrNumExtension

0xFFFF => PN_XNUM, a new constant in include/llvm/BinaryFormat/ELF.h

The condition (e_phnum == 0xFFFF || e_shnum == ELF::SHN_UNDEF || ELF::SHN_XINDEX == e_phnum) && e_shoff != 0; doesn't look correct.

Copy link
Contributor Author

@aokblast aokblast Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello the last issue have been fixed locally. I am still figuring out problems in objcopy so I might need some time to upload it.

ELFFile Result(Object);

//
// sections() parse the total number of sections by considering the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't seem to make sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

ELFFile(StringRef Object);

public:
const Elf_Word getPhNum() const { return e_phnum; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const on a value return type doesn't make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

@@ -772,7 +779,7 @@ template <class ELFT>
Expected<StringRef>
ELFFile<ELFT>::getSectionStringTable(Elf_Shdr_Range Sections,
WarningHandler WarnHandler) const {
uint32_t Index = getHeader().e_shstrndx;
uint32_t Index = e_shstrndx;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes like this seem a bit muddled to me, in that I'd expect e_shstrndx to have been resolved to the real index as part of constructing ELFFile, meaning you don't then need to compare it against SHN_XINDEX below.

It suggests to me that either you've got code left around from earlier versions of the change, or you aren't resolving the real values early enough. I don't want to end up in a situation where people will have to study the code to know whether e_shstrndx etc needs to be checked against SHN_XINDEX. This may mean avoiding calls to certain functions during the ELFFile constructor, at least until this resolution has taken place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it is my bad. I forget to delete the following if condition throughtly. I thought that we cannot call specific function like the ones with Expected<> return type in constructor originally. All other parts is safe after the assignment in create().

@@ -556,7 +556,7 @@ Sections:
# RUN: yaml2obj --docnum=25 %s -o %t25
# RUN: not llvm-readobj -h %t25 2>&1 | FileCheck -DFILE=%t25 --check-prefix=INVALID-SEC-NUM1 %s

# INVALID-SEC-NUM1: error: '[[FILE]]': unable to continue dumping, the file is corrupt: invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
# INVALID-SEC-NUM1: error: '[[FILE]]': invalid section header table offset (e_shoff = 0x58) or invalid number of sections specified in the first section header's sh_size field (0x3ffffffffffffff)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to preserve the status quo, if possible, regarding the error messages. The refactoring should be entirely non-functional from an end user perspective.

Copy link
Contributor Author

@aokblast aokblast Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emm, I am a little bit uncertain about this. Now, the ELFFile::section() error can be detected in the initialization step of ELFFile since we actually try to read it. Therefore, it makes some error happen early like this case since this error was happen in the middle when dumping instead of constructing ELFFile. Should we ignore silently in construction? Or should we report it right away? If we ignore it, we can preserve the original behavior and allow later step to handle it but we actually create an incorrect ELFFile. If we want to emit the error in ELFFile, we should do some modification like here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we lazily populate the information?

One of the issues we've had in the past with llvm-readobj and other dumping tools is it would error out early because it couldn't read some part of the file, long before it actually needed to display the relevant file information, which meant in turn it became hard to impossible to diagnose the problem when a file was malformed in some way. In practice, when we dump the file headers, whether the entire section header table is readable is irrelevant, for example. We should only need to read the 0th section header to do certain things. I hope that makes sense.

@@ -6,7 +6,7 @@ RUN: llvm-readobj --file-headers --sections --symbols %t2 | FileCheck %s
RUN: llvm-readelf --symbols %t2 | FileCheck --check-prefix=SYMS %s

## The ELF header should have e_shnum == 0 and e_shstrndx == SHN_XINDEX.
# CHECK: SectionHeaderCount: 0
# CHECK: SectionHeaderCount: 0 (65540)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change actually related to the code changes you've made, or is it just tightening up the test and is already the current behaviour? If the latter, please put it in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is part of the readelf (readobj). I would put to seperated PR. Thanks!

@@ -143,64 +143,67 @@ FileHeader:
# RUN: yaml2obj %s --docnum=4 -o %t.invalid1
# RUN: not llvm-readobj --file-headers %t.invalid1 2>&1 \
# RUN: | FileCheck %s --implicit-check-not=warning: -DFILE=%t.invalid1 \
# RUN: -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM
# RUN: -DSECHDRCOUNT=8192 -DSECHDRSTRTABINDEX=12288 --check-prefix=INVALID-LLVM-TC1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-TC1 is a meaningless suffix. Instead, rename the whole prefix to better describe the specific case that you're interested in, e.g. something like BAD-SHOFF-LLVM. I'd also add a comment at the start of each test case explaining what the case is testing.

In ELF file, there is a possible extended header for those phnum, shnum,
and shstrndx larger than the maximum of 16 bits. This extended header
use section 0 to record these fields in 32 bits.  We implment this
feature so that programs rely on ELFFile::program_headers() can get the
correct number of segments. Also, the consumers don't have to check the
section 0 themselve, insteead, they can use the getPhNum() as an
alternative.
@aokblast
Copy link
Contributor Author

aokblast commented Oct 9, 2025

It seems that github does not have features that can submit patches based on another branch on my local tree. SO I send another PR with two commits. #162648

BTW, should I provide tests for this patch while the readobj patch have already tested it?

Also, thanks for you two to help me review my patch!

@aokblast aokblast changed the title [Object][ELF] Support extended header for Object Parser in ELF [Object,ELF] Impelment PN_XNUM extension for program headers Oct 10, 2025
@aokblast aokblast changed the title [Object,ELF] Impelment PN_XNUM extension for program headers [Object,ELF] Implement PN_XNUM extension for program headers Oct 16, 2025
@aokblast aokblast requested review from MaskRay and jh7370 October 16, 2025 13:09
Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid force pushing to branches under active review, per LLVM policy, as it's unnecessary and means it's no longer possible to review changes to the PR since the previous review. You can use a merge commit, if you want to rebase, or fixup commits to make additional changes in the PR. All commits will be squashed into a single commit when landing the PR, because LLVM uses the "Sqaush & Merge" approach in GitHub. See https://llvm.org/docs/GitHub.html for more details.

@@ -278,9 +278,16 @@ class ELFFile {
std::vector<Elf_Shdr> FakeSections;
SmallString<0> FakeSectionStrings;

Elf_Word RealPhNum;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth a comment immediately before this block of variables explaining why they are needed and Header.e_phnum etc can't be relied upon directly?

Also, as these are no longer referring directly to a specific field in the ELF (since they could be the value in the ELF header or the 0th section header), I'd stop using Elf_* types for them, perhaps using size_t instead, since by this point they are more about being indexes and counts of things in memory/in a file.

@@ -379,22 +386,21 @@ class ELFFile {

/// Iterate over program header table.
Expected<Elf_Phdr_Range> program_headers() const {
if (getHeader().e_phnum && getHeader().e_phentsize != sizeof(Elf_Phdr))
if (RealPhNum && getHeader().e_phentsize != sizeof(Elf_Phdr))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're introducing a getter, perhaps it makes sense to always use the getter, even within the class?

Index = Sections[0].sh_link;
}
uint32_t Index = RealShStrNdx;
if (Index == ELF::SHN_XINDEX)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a bug here: if there happen to be exactly SHN_XINDEX section headers, this check will spuriously report an error, because the sh_size field of the null section header will happen to be 0xffff.

if (!Header.hasPhdrNumExtension())
return;

// An ELF binary may report `hasExtendedHeader` as true but not actually
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasExtendedHeader is stale, I think?

More generally, this is an example of why I think it makes more sense to lazily initialise the Real* values. By lazily initialising, we don't have to wonder whether these values have actually been set properly or not and therefore whether we need to look at section 0 again. It would also help with the other case I've commented on: if we make Real* values std::optional, we can detect whether they've been set or not, then once they've been set, we can rely on these values always being the real values.

Copy link
Contributor Author

@aokblast aokblast Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use std::optional for the new getter?
Or just returns Real_* ? Real_ : getHeader()->e_*.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the getter should be the thing that does the initialization (if needed).

So something like:

mutable std::optional<uint32_t> RealPhNum;
mutable std::optional<uint32_t> RealShNum;
mutable std::optional<uint32_t> RealShStrndx;

Expected<uint32_t> getPhNum() const { // Probably should be defined in a .cpp at this point.
  if (!RealPhNum)
    if (Error E = readShdrZero())
      return std::move(E);
  return *RealPhNum;
}
// Same for ShNum and ShStrndx.

Error readShdrZero() {
  const Elf_Ehdr &Header = getHeader();
  if (Header.e_phnum == PN_XNUM || Header.e_shnum == 0 || Header.e_shstrndx == SHN_XINDEX)
  auto SecOrErr = getSection(0); // NB: maybe should use Expected<...> here instead, to fit LLVM coding standards.
  if (!SecOrErr)
    return SecOrErr.takeError();

  RealPhNum = Header.e_phnum == PN_XNUM ? (*SecOrErr)->sh_info : Header.e_phnum;
  RealShNum = Header.e_shnum == 0 ? (*SecOrErrr)->sh_size : Header.e_shnum;
  RealShStrndx = Header.e_shstrndx == SHN_XINDEX ? (*SecOrErr)->sh_link : Header.e_shstrndx;
}

Miscellaneous notes about the above:

  • The above is not likely to be 100% correct, as I haven't tested it, but should help clarify what I'm expecting.
  • The Elf gABI says e_shnum == 0 indicates to use the 0th section header sh_size field, not SHN_UNDEF. While the two technically are the same value, it's more correct to compare against the literal 0 and not SHN_UNDEF.
  • I've used PN_XNUM, not 0xffff, since we should use the constants when we have them and the gABI does (NB: the draft gABI hasn't been updated with the PN_XNUM extension yet, but it has been agreed on the mailing list).
  • By doing this, the error isn't reported until the first (and subsequent) attempts to read the the real values, so if the values aren't needed, no error is reported, and the error is only reported once it is actually needed. If multiple reports of the error are undesirable, the caller can handle it (see e.g. llvm-readelf's reportUniqueWarning).
  • If doing this, it's essential we don't reference Real... directly outside the example code I've posted, even if it's accessible. You could go as far as wrapping the fields and initialisation in another class, to enforce this, but I don't think it's worth doing that, personally.

Copy link
Contributor Author

@aokblast aokblast Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I get what you mean.

A question is that should we consume up the error or report it back when the caller is inside the class?

Since some member functions like program_headers do not expect that it fails on reading e_phnum. That means we need to change some testcase if we want to report it back since the user of program_headers would be expected to early return and early report of the error.

But you said that we don't want to see any difference in a user's viewpoint. That is why I choose to set these field in the constructor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we don't want to introduce a test with 0xffff program headers, we can create a test, adapted from test/llvm-objcopy/ELF/invalid-e_phoff.test, that shows the llvm-readelf behavior when e_phnum is 0xffff, but the section header table doesn't contain that many of sections. The diagnostic is probably different.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question is that should we consume up the error or report it back when the caller is inside the class?

I'd propagate it as far as possible until you have to change an API, then consume the error with a TODO comment to highlight we'd like to propagate it further. We've taken this approach in other cases. A future change could be made to propagate it further, where we'd evaluate what should be done on a case-by-case basis.

As I understand it, with my suggested approach, the only time we'd see an error is if e_phnum is PN_XNUM/e_shstrndx is SHN_XINDEX/e_shnum is 0 and the section header table can't be read (or there are no section headers in it). As a result, this should preserve the existing behaviour for all current test cases.

Regarding testing, strictly-speaking, I expect the final gABI wording to match that of the Oracle document, which technically means the extension should only be used if there are more than 65534 program headers. However, fundamentally I don't see any reason we'd enforce this in the binary tools (possibly with the exception in llvm-objcopy when writing an object out that used the extension unnecessarily), so if we used yaml2obj to specify e_phnum to be PN_XNUM and to manually populate section header 0, we could show that it correctly reads this in one test and then in a different test case (or cases) we could have it fail because e.g. the section header table is missing (there's a yaml2obj feature that allows you to suppress the table).

Copy link
Contributor Author

@aokblast aokblast Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please check if the current change is what you expected? If that is the case, I will rebase my llvm-objcopy branch and force push just for once, since another branch is so dirty. Sorry for the bothering and nuance.

There are actually some corner test cases has the problem you have mentioned (SHN_UNDEF and the section size is actually 0). I just add some comments there.

return;

// An ELF binary may report `hasExtendedHeader` as true but not actually
// include an extended header. For example, a core dump can contain 65,535
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The term "extended header" isn't used in the ELF gABI, so I don't think we should use it here. The header hasn't really been extended, in my opinion, so I think the term used in the Oracle docs (and possibly elsewhere) is misleading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the better choice for us?

According to gabi, they don't provide special term for this condition. Also, extended header is used in LLDB which also provides mechanism to parse special section 0.

Should we call hasSection0Header? Or something else?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the code suggestion I've given, the hasPhdrNumExtension should be redundant. If a comment is needed within the code suggestion, I'd just refer to it as "the section header at index 0" or something to that effect. I expect you'll see other examples in the code or commit history.

SmallString<0> FakeSectionStrings;

//
// According to the ELF gABI, these three fields can be recorded in section 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the first // and last // lines. Suggested a separate comment for each variable

// When the number of program headers is >= 0xffff, the actual number is contained in the ... field of the section header at index 0.
std::optional<uint32_t> RealPhNum;
// When the number of section headers is >= 0xff00, the actual number is contained in the sh_size field ...
std::optional<uint32_t> RealShNum;
// ...

In ELF file, there is a possible extended header for those phnum, shnum,
and shstrndx larger than the maximum of 16 bits. This extended header
use section 0 to record these fields in 32 bits.  We implment this
feature so that programs rely on ELFFile::program_headers() can get the
correct number of segments. Also, the consumers don't have to check the
section 0 themselve, insteead, they can use the getPhNum() as an
alternative.
if ((*SecsOrErr).size() == 0) {
if (Header.e_phnum == ELF::PN_XNUM ||
Header.e_shstrndx == ELF::SHN_XINDEX) {
return createError("Unable to find Section 0");
Copy link
Contributor Author

@aokblast aokblast Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emm, some test fails on this new report.
See: https://github.com/llvm/llvm-project/actions/runs/18658967594/job/53194795927?pr=162288#step:3:12638.
Should we ignore silently so that it can report the error in the correct place or just modify the testcases?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it's a genuine failure, i.e. you've done something in this PR that is incorrect. For context, the failing test case is:

## Check the case when e_shstrndx == SHN_XINDEX, but null section's sh_link contains
## the index of a section header string table that is larger than the number of the sections.

# RUN: yaml2obj --docnum=30 %s -o %t30
# RUN: not llvm-objcopy %t30 2>&1 | FileCheck %s -DFILE=%t30 --check-prefix=INVALID-SHSTRTAB-INDEX

# INVALID-SHSTRTAB-INDEX: error: section header string table index 255 does not exist

--- !ELF
FileHeader:
  Class:     ELFCLASS64
  Data:      ELFDATA2LSB
  Type:      ET_REL
  Machine:   EM_X86_64
## SHN_XINDEX == 0xffff.
  EShStrNdx: 0xffff
Sections:
  - Type: SHT_NULL
    Link: 0xff

Section header 0 clearly should be readable in this case, but the test is failing with that unreadable section error you've introduced. The failure would come later on, when attempting to use the section header string table index - see

return createError("section header string table index " + Twine(Index) +
. (Aside: getSectionStringTable and similar functions should be modified in this PR to reference Real* and call this method if the values are unitialised and not reference e_shnum directly).

The obj2yaml test failure also looks genuine, probably for the same reason.

Copy link
Contributor Author

@aokblast aokblast Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realize that sh_size * sizeof(ELF_Shdr) is 64bit field. Even gnu-readelf reports 32bit.
That is why I think it is a new error when I tried to compare the result with gnu version.

Copy link
Collaborator

@jh7370 jh7370 Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sh_link isn't 64-bit. It's an Elf32_Word or Elf64_Word, but those are both 32-bits (see https://gabi.xinuos.com/elf/03-sheader.html).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix comments.

if ((*SecsOrErr).size() == 0) {
if (Header.e_phnum == ELF::PN_XNUM ||
Header.e_shstrndx == ELF::SHN_XINDEX) {
return createError("Unable to find Section 0");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it's a genuine failure, i.e. you've done something in this PR that is incorrect. For context, the failing test case is:

## Check the case when e_shstrndx == SHN_XINDEX, but null section's sh_link contains
## the index of a section header string table that is larger than the number of the sections.

# RUN: yaml2obj --docnum=30 %s -o %t30
# RUN: not llvm-objcopy %t30 2>&1 | FileCheck %s -DFILE=%t30 --check-prefix=INVALID-SHSTRTAB-INDEX

# INVALID-SHSTRTAB-INDEX: error: section header string table index 255 does not exist

--- !ELF
FileHeader:
  Class:     ELFCLASS64
  Data:      ELFDATA2LSB
  Type:      ET_REL
  Machine:   EM_X86_64
## SHN_XINDEX == 0xffff.
  EShStrNdx: 0xffff
Sections:
  - Type: SHT_NULL
    Link: 0xff

Section header 0 clearly should be readable in this case, but the test is failing with that unreadable section error you've introduced. The failure would come later on, when attempting to use the section header string table index - see

return createError("section header string table index " + Twine(Index) +
. (Aside: getSectionStringTable and similar functions should be modified in this PR to reference Real* and call this method if the values are unitialised and not reference e_shnum directly).

The obj2yaml test failure also looks genuine, probably for the same reason.


// Pretend we have section 0 or sections() would call getShNum and thus
// become an infinite recursion
RealShNum = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This code should be guarded by an e_shnum == 0 check. If e_shnum != 0, we don't need to do funky things to read the index 0 header, like pre-filling RealShNum with a fake value, and can instead just set RealShNum to the right value up-front prior to calling sections().
  2. Perhaps this should be 1, not 0 and we should call getSection(0)? That would allow sections() (called by getSection(0) to read the first section header, if it is within the bounds of the file, or generate an appropriate error otherwise. Note that if there really are no section headers, getSection(0) will emit an error (see
    return createError("invalid section index: " + Twine(Index));
    ), indicating that section 0 doesn't exist, which is exactly what we want.
  3. sections() should not be referencing e_shnum directly. It should use RealShNum, which will either contain the real value taken directly from e_shnum (because it's non-zero), a previously calculated value (derived from an earlier call to this function), or the fake value we set prior to calling sections() for the first time (i.e. 1 - see point 2)).

Copy link
Contributor Author

@aokblast aokblast Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we actually have sh_num == 0, and both phnum and shstrndx are valid. It should not be considered as an Error. Right? Since the search of shdr 0 is caused by shnum (not phnum or shstrndx) but we actually have 0 sections. For this case, we should not regard it as an error.

Update: I got what you mean, just ignore this.

// Pretend we have section 0 or sections() would call getShNum and thus
// become an infinite recursion
RealShNum = 0;
auto SecsOrErr = sections();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the above suggested points, you should be able to call getSection(0) here.

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. This is looking mostly good. I've got a few remaining small comments.

We'll need to audit the existing code within LLVM, to ensure people aren't doing the section 0 reading themselves still, but those can probably be left until future PRs, if the tests are all passing.

return std::move(E);
}
return *RealPhNum;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: new lines between functions, please.

Index = Sections[0].sh_link;
Expected<uint32_t> ShStrNdxOrErr = getShStrNdx();
if (!ShStrNdxOrErr || (*ShStrNdxOrErr == ELF::SHN_XINDEX && RealShNum == 0)) {
consumeError(ShStrNdxOrErr.takeError());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be consuming the ShStrNdxOrErr here. Indeed, I'd be somewhat surprised if this didn't cause a failure in some configurations, if there wasn't actually an Error in the Expected (because the value is set to 0).

Also, the second half of the if isn't technically correct, since the section header 0 section might have sh_size 0 but sh_link 0xffff (in a dodgy ELF). This is probably a case where referencing the original e_shstrndx from the header makes sense.

I think the code should look like something this:

Expected<uint32_t> ShStrNdxOrErr = getShStrNdx();
if (!ShStrNdxOrErr)
  return ShStrNdxOrErr.takeError();

if (getHeader().e_shstrndx == SHN_XINDEX && Sections.empty())
  return createError(...);

I'm assuming that Sections.empty() works fine here, since that's what the old code did, but if not, due to other changes we've made, I'd be okay with *RealShStrNdx == 0 in place of it.

NB: there might be some edge cases where the error message ends up being different to the old code. In this specific case, I'm okay with that, as long as we have test coverage for the existing error produced by this function, when there are no sections.

Header.e_shoff != 0) {

// Pretend we have section 0 or sections() would call getShNum and thus
// become an infinite recursion
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// become an infinite recursion
// become an infinite recursion.

Nit: comments should be full sentences, with trailing full stops.

if ((Header.e_phnum == ELF::PN_XNUM || Header.e_shnum == 0 ||
Header.e_shstrndx == ELF::SHN_XINDEX) &&
Header.e_shoff != 0) {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't start blocks with a blank line.

@aokblast
Copy link
Contributor Author

Sorry for the late reply and thanks for continously reviewing my patches!
Some style nit are really hard for me to get rid of as a developer of other projects.

Thanks for the updates. This is looking mostly good. I've got a few remaining small comments.

We'll need to audit the existing code within LLVM, to ensure people aren't doing the section 0 reading themselves still, but those can probably be left until future PRs, if the tests are all passing.

Sure. It is now in my TODO list. Thanks for your advise! I would check that after the readobj patch is done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants